Preserve Gmail labels in the mbox as sub-labels of the main mbox label.#32
Preserve Gmail labels in the mbox as sub-labels of the main mbox label.#32dotysan wants to merge 6 commits intogoogle:masterfrom
Conversation
dotysan
commented
Oct 31, 2017
- Also ignore messages in Trash/Spam.
- Don't bother creating Unread label.
- Also ignore messages in Trash/Spam. - Don't bother creating Unread label.
|
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
|
I signed the CLA. |
|
CLAs look good, thanks! |
eesheesh
left a comment
There was a problem hiding this comment.
Hi,
Thanks for this! I've been wanting to add Google Takeout support for a while but never had the time.
I have quite a few changes I'd like to see before approving it. I understand that you may not have time to do it, so if you don't - that's fine, please let me know.
| import logging | ||
| import logging.handlers | ||
| import mailbox | ||
| from csv import reader |
There was a problem hiding this comment.
Please order imports alphabetically
| dest='with_labels', | ||
| required=False, | ||
| action='store_false', | ||
| help= |
There was a problem hiding this comment.
This is meant for MBOXes created by Google Takeout, right? In that case, I think it would be more helpful to say "Preserve Gmail labels from Takeout mbox files" and naming it "--takeout-labels".
Also, I think there's a lot of value in importing read/unread status and spam/trash for some people, so I'd suggest creating separate boolean flags for "--takeout-read-state" and "--takeout-spam-trash".
There was a problem hiding this comment.
Agreed. I was just lazy. Will implement these suggestions now.
| logging.info("Processing message %d in label '%s'", index, labelname) | ||
|
|
||
| if args.with_labels and 'X-Gmail-Labels' in message: | ||
| gmail_labels= next(reader([message['X-Gmail-Labels']])) |
There was a problem hiding this comment.
Won't this get only the first label?
There was a problem hiding this comment.
Also: Have you tested what happens if there's a label with a comma in its name in Gmail before the Takeout export?
There was a problem hiding this comment.
Yes. That's why I used from csv import reader. However in testing, I found that I forgot to handle a long list of labels that wraps the header. Will fix that now too.
|
|
||
| if args.with_labels and 'X-Gmail-Labels' in message: | ||
| gmail_labels= next(reader([message['X-Gmail-Labels']])) | ||
| #logging.info('Found Gmail Labels: %s', gmail_labels) |
There was a problem hiding this comment.
Why not log it? I think it's valuable.
There was a problem hiding this comment.
Me too! Just a bit chatty. I'll remove the comment.
| logging.info("Skipped Trash message %d in label '%s'", index, labelname) | ||
| continue | ||
| if 'Unread' in gmail_labels: | ||
| gmail_labels.remove('Unread') |
There was a problem hiding this comment.
See my note above in the flags - I think this should be controlled by the user (and specifically the unread label should be imported by default).
There was a problem hiding this comment.
OK. However, if I import the 'Unread' label as a sub-label, it doesn't work as expected. So I'll add the code/arg you suggested and make sure it is imported without the sub-label adjustment.
| if 'Unread' in gmail_labels: | ||
| gmail_labels.remove('Unread') | ||
|
|
||
| label_ids= [label_id] |
There was a problem hiding this comment.
Please make spacing consistent with the rest of the code (here and below).
| gmail_labels.remove('Unread') | ||
|
|
||
| label_ids= [label_id] | ||
| for sublabel in gmail_labels: |
There was a problem hiding this comment.
I don't think this will work with how you use next() above.
>>> from csv import reader
>>> gmail_labels= next(reader("a,b,c"))
>>> gmail_labels
['a']
>>> next(gmail_labels)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator
>>> for sublabel in gmail_labels:
... print sublabel
...
a
There was a problem hiding this comment.
It works for me. FYI, I actually did this:
>>> from csv import reader
>>> gmail_labels= next(reader(["a,b,c"]))
>>> gmail_labels
['a', 'b', 'c']
| label_ids.append(get_label_id_from_name(service, username, labels, sublabel)) | ||
| metadata_object= {'labelIds': label_ids} | ||
|
|
||
| # TODO: test/handle when there is no labels header? |
There was a problem hiding this comment.
Isn't this already handled by "if args.with_labels and 'X-Gmail-Labels' in message" above? What else needs to be tested?
There was a problem hiding this comment.
Oops. It was a note-to-self written before I did the deed. Should have removed it before the PR.
| logging.exception( | ||
| 'Failed to replace text/quoted-printable with text/plain ' | ||
| 'in Content-Type header') | ||
|
|
There was a problem hiding this comment.
Why are there new empty lines here and elsewhere?
There was a problem hiding this comment.
Was just for my own personal readability/style. I'll clean it up and make it consistent with yours.
| continue | ||
| logging.info("Processing message %d in label '%s'", index, labelname) | ||
|
|
||
| if args.with_labels and 'X-Gmail-Labels' in message: |
There was a problem hiding this comment.
This is quite long, maybe extract it into a separate function?
There was a problem hiding this comment.
Can do. I was just following existing convention in the code. I.e.
if args.fix_msgid and 'Message-ID' in message:
|
Here's what I've implemented: ...still hafta test that last one! |
eesheesh
left a comment
There was a problem hiding this comment.
Some minor changes requested. Have you tested all of the new options yet? (as well as with a "normal" mbox in a directory structure to ensure no regressions)
Thanks again for your help!
| dest='takeout_labels', | ||
| choices=[True, False, 'sublabels'], | ||
| required=False, | ||
| default=True, |
|
|
||
| if args.takeout_labels and 'X-Gmail-Labels' in msg: | ||
| gmail_labels = next(reader([msg['X-Gmail-Labels'].replace('\r\n', '')])) | ||
| logging.info('Found Gmail Labels: %s', gmail_labels) |
There was a problem hiding this comment.
Maybe "Found Gmail labels from Takeout"? Otherwise the user might not understand the context (if they don't remember this mbox came from Takeout, etc.), this would help differentiate it from finding a label in the sense of finding a new mbox file.
| logging.info('Found Gmail Labels: %s', gmail_labels) | ||
|
|
||
| if args.takeout_spam_trash: | ||
| # TODO: test if we are importing spam/trash without renaming to sublabels |
There was a problem hiding this comment.
Are we not testing that?
|
|
||
| for gmail_label in gmail_labels: | ||
| if args.takeout_labels == 'sublabels' and gmail_label != 'Unread': | ||
| gmail_label = "%s/%s" %(mbox_label_name, gmail_label) |
| logging.exception('Failed to fix brackets in Message-ID header') | ||
| metadata_object = {'labelIds': [label_id]} | ||
| metadata_object = get_metadata(service, username, labels, message, label_id, labelname) | ||
| if not metadata_object: continue |
There was a problem hiding this comment.
- Please put
continuein a new indented line. - When does this happen? Only if this is a spam/trash label we're skipping?
| raise | ||
|
|
||
|
|
||
| # TODO: instead of passing mbox_label_name as param, can't we look it up from id? |
There was a problem hiding this comment.
We can, but it would be less efficient (potentially much slower if you have many labels and many messages). I think we can drop this comment unless you feel strongly about it.
|
Hi @dotysan, please let me know if you don't have time to make the changes I requested so I can take it up myself if needed. :) Thanks! |
|
Unfortunately, I haven't had a chance to test much. Plus, I got hung up trying to decide what SYSTEM labels to honor when importing into a sublabel... For example, in the current PR, when importing In my un-pushed tree, I've also honored the STARRED and IMPORTANT system labels and subsequent status formatting. Still fiddling with it a bit more. Feedback? |
|
Hi, sorry for the delay in response, I was away. I think treating STARRED and IMPORTANT as system labels (i.e. not prefixing them) is good, and IMHO there's no need in adding a flag on whether to preserve them like read/unread. Thanks! |
|
So I'm finally back on this and testing/refining. And just stumbled across an annoying (redundant?) label that is inconsistently applied. https://productforums.google.com/forum/#!topic/gmail/eKveqdKxZDM Think I'll add the code to totally ignore "Archived" label since it's already implied with the absence of the "Inbox" label. Sound good? |
Yes, if there's no other label it will go to All Mail which is the same as archived. Thanks! |